feat(fetch): use HTTP content negotiation to request native markdown #4052

Open
sidney wants to merge 1 commit into modelcontextprotocol:main from sidney:feat/fetch-content-negotiation

@sidney sidney commented Apr 26, 2026

Send Accept: text/markdown, text/html;q=0.9, */*;q=0.8 with each fetch request and pass the body through unchanged when the server responds with Content-Type: text/markdown.

Description

Detects servers that respond with text/markdown and fast-tracks the Markdown content they send natively.

Fully backwards compatible. No change to processing from servers that don't send Markdown.

fetch_url now advertises a Markdown preference via the Accept header. When the server responds with Content-Type: text/markdown (with or without a charset parameter), the body is returned as-is, skipping the readability + markdownify pipeline. Anything else falls through to the existing extraction unchanged.

Strict media-type matching: only text/markdown triggers the short-circuit. text/markdown; charset=utf-8 qualifies; non-standard variants like text/x-markdown do not. There's a test pinning this contract.
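For reviewers, the matching contract described above can be sketched roughly as follows. This is an illustration only, not the code in the diff: the helper name is_native_markdown is hypothetical, and the case-insensitive comparison is an assumption based on media types being case-insensitive.

```python
def is_native_markdown(content_type: str) -> bool:
    """Return True only when the media type is exactly text/markdown.

    Hypothetical helper for illustration; the name and the lowercasing
    step are assumptions, not taken from the PR diff.
    """
    # Drop parameters such as "; charset=utf-8", then normalise.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/markdown"


# The three cases named in the description:
print(is_native_markdown("text/markdown"))                 # exact match
print(is_native_markdown("text/markdown; charset=utf-8"))  # charset allowed
print(is_native_markdown("text/x-markdown"))               # non-standard variant
```

Comparing the parsed media type rather than using a substring check is what keeps text/x-markdown and similar variants out of the fast-path.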

Server Details

  • Server: fetch
  • Changes to: tools (the fetch tool's underlying request and response handling)

Motivation and Context

Some servers can deliver native Markdown directly — Cloudflare zones with Markdown for Agents enabled, content-negotiating CMSes, raw-content endpoints, and so on. Today mcp-server-fetch ignores that and runs every response through readability + markdownify, which is lossy and unnecessary when the server is already offering Markdown.

Cloudflare's own measurements report ~80% token reduction compared to the equivalent HTML — their docs blog post is 16,180 tokens as HTML vs 3,150 tokens as Markdown. Even outside the Cloudflare case, any site that already speaks text/markdown benefits from the body being passed through unmodified rather than round-tripped through HTML extraction.

Accept is a hint: servers that don't perform content negotiation simply respond with whatever they always would, so the change is fully backwards-compatible.

How Has This Been Tested?

Unit tests — existing TestFetchUrl tests pass unchanged. Four new tests appended:

  • test_fetch_markdown_returns_early: text/markdown body returned as-is, no extraction, empty prefix.
  • test_fetch_markdown_with_charset: text/markdown; charset=utf-8 qualifies.
  • test_fetch_x_markdown_does_not_match: text/x-markdown falls through to the raw-fallback branch (pins the strict-matching contract).
  • test_fetch_sends_accept_header: verifies the Accept header is sent and that markdown is preferred over HTML in the q-list.
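To show what "markdown is preferred over HTML in the q-list" means in practice, here is a small standalone sketch that ranks the entries of the Accept value by q-value. The parse_accept function is a hypothetical helper written for this comment, not part of the test suite, and it handles only the simple header form used in this PR.

```python
from typing import List, Tuple

# Accept value quoted in the PR description.
ACCEPT = "text/markdown, text/html;q=0.9, */*;q=0.8"


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs, highest q first.

    Hypothetical helper; ignores parameters other than q.
    """
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_range = pieces[0]
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((media_range, q))
    # sorted() is stable, so entries with equal q keep header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


ranked = parse_accept(ACCEPT)
print(ranked)
```

text/markdown carries no explicit q, so it takes the default of 1 and ranks ahead of text/html at 0.9 and the wildcard at 0.8.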

MCP Inspector: npx @modelcontextprotocol/inspector uv run mcp-server-fetch. Verified two scenarios:

  1. Fast-path: https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ (a Markdown-for-Agents-enabled origin) returns native markdown directly. Same URL without the Accept header serves HTML — easy to confirm with curl -sI.
  2. Existing path unchanged: https://example.com/ flows through readability + markdownify and produces the expected structured markdown.

LLM client — patched mcp-server-fetch installed in Claude Desktop, exercised against both URLs above. Fast-path returns clean markdown with a YAML frontmatter (Cloudflare's edge-converted output); readability path returns the existing structured extraction.

Breaking Changes

None. The change is fully backwards-compatible — servers that don't perform content negotiation ignore the Accept header and respond as they always would. Users do not need to update MCP client configurations.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Protocol Documentation
  • My changes follow MCP security best practices
  • I have updated the server's README accordingly
  • I have tested this with an LLM client
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • [N/A] I have added appropriate error handling
  • [N/A] I have documented all environment variables and configuration options

(Note on the last two: no new error paths or environment/configuration options are introduced — the fast-path inherits the existing GET call's error handling, and the only added logic is a Content-Type string match.)

Additional context

Cloudflare's Markdown for Agents documentation page is itself served from a Markdown-for-Agents-enabled origin, so it works as a live test target:

$ curl -sI -H "Accept: text/markdown" \
    https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
    | grep -i content-type
content-type: text/markdown; charset=utf-8

$ curl -sI \
    https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
    | grep -i content-type
content-type: text/html; charset=utf-8

Same URL, two different Content-Types depending on the Accept header.

The raw=True flag is unaffected: when a server returns text/markdown, the fast-path returns the raw body (which is what raw=True would also produce); when a server returns HTML, raw=True continues to short-circuit to the raw HTML branch as before.
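The interaction between the fast-path and raw can be summarised with a simplified sketch. It is not the code in the diff: select_output is a hypothetical name, extract stands in for the readability + markdownify step, and the prefix string that the real function also returns is left out.

```python
from typing import Callable


def select_output(
    content_type: str,
    body: str,
    raw: bool,
    extract: Callable[[str], str],
) -> str:
    """Pick what to return for a fetched response (simplified sketch)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/markdown":
        # Fast-path: native Markdown is returned untouched, which is
        # also what raw=True would produce for this response.
        return body
    if raw:
        # Unchanged behaviour: raw short-circuits to the raw body.
        return body
    # Unchanged behaviour: everything else goes through extraction.
    return extract(body)


def fake_extract(html: str) -> str:
    # Stand-in for the real HTML-to-Markdown extraction.
    return "extracted:" + html


print(select_output("text/markdown; charset=utf-8", "# Title", False, fake_extract))
print(select_output("text/html", "<h1>Title</h1>", True, fake_extract))
print(select_output("text/html", "<h1>Title</h1>", False, fake_extract))
```

Only the third call reaches the extraction step; the first two return the body exactly as received.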

Send Accept: text/markdown, text/html;q=0.9, */*;q=0.8 with each fetch
request. When the server responds with Content-Type: text/markdown (with
or without a charset parameter), return the body as-is, skipping the
readability + markdownify extraction. Otherwise the existing pipeline
runs unchanged.

Servers that don't perform content negotiation ignore the Accept header
and respond as they always would, so this is fully backwards-compatible.